i.cca: use 0 based array indexing #3239

marisn · 2023-11-15T07:23:49Z

When ccmath library was introduced in 49b8066 most of arrays were moved to 0 based indexing but some locations got missed. Most likely it is the source of a crash reported a few years a go: https://trac.osgeo.org/grass/ticket/2297

With this fix I'm able to run the code without a segfault, but I haven't figured out how to get any output and thus no tests are provided.

When ccmath library was introduced in 49b8066 most of arrays were moved to 0 based indexing but some locations got missed. Most likely it is the source of a crash reported a few years a go: https://trac.osgeo.org/grass/ticket/2297

marisn · 2023-11-16T19:22:15Z

Although this PR fixes segfault, the code does not produce correct output (in comparison to pre- 49b8066 version). Unless the testcase passes, this PR does not improve on the current state.

metzm · 2023-12-03T17:10:57Z

imagery/i.cca/main.c

-                cov[i][j][k] = cov[i][k][j] = sigs.sig[i - 1].var[j - 1][k - 1];
+        for (j = 0; j < bands; j++) {
+            mu[i][j] = sigs.sig[i].mean[j];
+            for (k = 0; k < j; k++) {


try

Suggested change

for (k = 0; k < j; k++) {

for (k = 0; k <= j; k++) {

because j is not the upper bound of k to be excluded but to be included, also with zero-based indexing.

Thanks for spotting my mistake!

Still even with this being fixed, output is not identical to the old version. Should it be identical? Suggestions how to test?

The functions in stats.c assume zero-based indices, thus I would expect different results in the sense that results with your PR (zero based indices) are correct whereas previous results (one based indices) were not correct, particularly because the indexing to cov[][][] did not change.

You would need some independent, trusted alternative for Canonical components analysis in order to decide which results are correct. Regarding the code, your PR seems like a bugfix, previous results are probably not correct.

Just for clarification, cov has been indexed previously in main.c as cov[i][j][k] which was wrong for one-based i, j, k because zero-based i, j, k are used in stats.c. With your PR, cov is consistently indexed with zero-based i, j, k, that could explain the differences.

imagery/i.cca/testsuite/test_i_cca.py

neteler · 2024-12-03T16:30:48Z

Testing this PR I see that the component numbers in the warning differ from the generated output:

# GRASS nc_spm_08_grass7/landsat:~ >
i.cca group=landsat_group subgroup=landsat_subgroup    \
       ignature=landsat_signatures output=cca_component
Transform completed.
WARNING: The output cell map <cca_component.0> has values outside the 0-255
         range.
WARNING: The output cell map <cca_component.1> has values outside the 0-255
         range.
WARNING: The output cell map <cca_component.2> has values outside the 0-255
         range.

g.list raster pattern="cca_component*"
cca_component.1
cca_component.2
cca_component.3
cca_component.4

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

neteler · 2024-12-03T17:47:48Z

imagery/i.cca/testsuite/test_i_cca.py

+        cls.runModule("g.region", raster="lsat7_2000_20")
+        cls.group_name = tempname(10)
+        cls.subgroup_name = "vis"
+        cls.runModule(
+            "i.group",
+            group=cls.group_name,
+            subgroup=cls.subgroup_name,
+            input="lsat7_2000_20,lsat7_2000_30,lsat7_2000_40",
+        )


Suggested change

cls.runModule("g.region", raster="lsat7_2000_20")

cls.group_name = tempname(10)

cls.subgroup_name = "vis"

cls.runModule(

"i.group",

group=cls.group_name,

subgroup=cls.subgroup_name,

input="lsat7_2000_20,lsat7_2000_30,lsat7_2000_40",

)

cls.runModule("g.region", raster="lsat7_2002_20")

cls.group_name = tempname(10)

cls.subgroup_name = "vis"

cls.runModule(

"i.group",

group=cls.group_name,

subgroup=cls.subgroup_name,

input="lsat7_2002_20,lsat7_2002_30,lsat7_2002_40",

)

There is an CI error: "b'ERROR: Raster map <lsat7_2000_20> not found\n'" - the available Landsat datasets are of 2002. Updated accordingly.

This will probably affect the expected results, @marisn?

Markus, I have had no time (and am not certain if my math level is up to task) to go over formulas from the original paper (or was it a technical document?) to create a synthetic dataset with known results as I did for the r.kappa module. This is the reason why this PR has not been merged as at the moment is possible only to test if module runs but not if it produces correct output.

Sure, funny understood.
So I'd suggest to commit the data name change and see what happens next.

So, it went further, till here:

Running ./imagery/i.cca/testsuite/test_i_cca.py... ======================================================================== F ====================================================================== FAIL: test_output_values (__main__.OutputMatchTest.test_output_values) Test correctness of i.cca output ---------------------------------------------------------------------- Traceback (most recent call last): File "imagery/i.cca/testsuite/test_i_cca.py", line 83, in test_output_values self.assertRasterMinMax( File "etc/python/grass/gunittest/case.py", line 539, in assertRasterMinMax self.fail(self._formatMessage(msg, stdmsg)) AssertionError: Wrong calculated value The actual maximum (14) is greater than the reference one (-3) for raster map tmp_oy5WA6oFO0.2 (with minimum -17)

neteler · 2024-12-04T21:59:43Z

Just for fun, here an attempt for a synthetic i.cca test case (surely to be improved to become a real test case):

Minimalistic i.cca Example with Synthetic Data

Set a simple region for your synthetic raster maps, creating a 10x10 grid with a resolution of 10 units:

g.region n=100 s=0 e=100 w=0 res=10 -p

Generate synthetic raster maps by using r.mapcalc to create input raster maps with varying patterns:

# map1: Random values between 0 and 100
r.mapcalc "map1 = rand(0, 100)" -s
# map2: X-coordinate values with added noise
r.mapcalc "map2 = x() + rand(0, 10)" -s
map3: Y-coordinate values with added noise
r.mapcalc "map3 = y() + rand(0, 10)" -s

r.info -r map=map1
r.info -r map=map2
r.info -r map=map3

d.mon wx0
d.rast map1
d.rast map2
d.rast map3

Create an imagery group and subgroup with these synthetic raster maps:

i.group group=synthetic_group subgroup=synthetic_subgroup \
        input=map1,map2,map3

Create synthetic training data by generating a simple vector map with labeled polygons for training. This creates:

Random training points (training_points)
Buffer zones around points (training_areas)
Three classes (class = 1 .. 3)

v.random output=training_points n=5 seed=1
v.db.addtable map=training_points columns="class INTEGER"
v.db.update map=training_points column=class value=cat
v.db.select map=training_points

v.buffer input=training_points output=training_areas distance=20 -t
v.db.select map=training_areas

v.db.update map=training_areas column=class value=1 where="cat <= 2"
v.db.update map=training_areas column=class value=2 where="cat == 3"
v.db.update map=training_areas column=class value=3 where="cat > 3"
v.db.select map=training_areas
d.vect training_areas -c

Rasterize the training areas:

v.to.rast input=training_areas output=training_areas use=attr attribute_column=class
r.info map=training_areas -r
r.category map=training_areas

d.erase
d.rast training_areas

Next generate a signature file from the synthetic training data:

i.gensig trainingmap=training_areas group=synthetic_group subgroup=synthetic_subgroup \
         signaturefile=synthetic_signatures

Perform Canonical Components Analysis with i.cca:

i.cca group=synthetic_group subgroup=synthetic_subgroup \
      signature=synthetic_signatures output=cca_comp

Display the results:

d.rast map=cca_comp.1
d.rast map=cca_comp.2
d.rast map=cca_comp.3

Check statistics of the output raster maps:

r.univar map=cca_comp.1
r.univar map=cca_comp.2
r.univar map=cca_comp.3

d.erase
d.rgb b=cca_comp.1 g=cca_comp.2 r=cca_comp.3

marisn · 2024-12-05T10:55:54Z

Markus, this still only tests if module runs and not if module runs correctly. The proper solution is to open http://dbwww.essc.psu.edu/lasdoc/user/canal.html, come up with some numbers, plug them into provided formulas and get results. Then they can be used to create a test case that validates GRASS code output.

i.cca: use 0 based array indexing

f698757

When ccmath library was introduced in 49b8066 most of arrays were moved to 0 based indexing but some locations got missed. Most likely it is the source of a crash reported a few years a go: https://trac.osgeo.org/grass/ticket/2297

marisn added the C Related code is in C label Nov 15, 2023

neteler added the bug Something isn't working label Nov 15, 2023

neteler added this to the 8.4.0 milestone Nov 15, 2023

neteler added the backport to 8.3 label Nov 15, 2023

marisn marked this pull request as draft November 16, 2023 18:37

i.cca: add a test for i.cca output correctness

27074be

landam assigned marisn Nov 19, 2023

metzm reviewed Dec 3, 2023

View reviewed changes

marisn added 2 commits December 5, 2023 19:19

Merge branch 'main' into i_cca_fix

b71d67e

i.cca use all signature var values (thanks @metzm for spotting the bug)

19f8bcd

wenzeslaus added the errata Include in errata (fixes wrongly computed results) label Jan 5, 2024

wenzeslaus modified the milestones: 8.4.0, Future Apr 27, 2024

neteler removed the backport to 8.3 label Jun 14, 2024

Merge branch 'main' into i_cca_fix

982a052

github-actions bot added Python Related code is in Python module imagery tests Related to Test Suite labels Nov 28, 2024

github-actions bot reviewed Nov 28, 2024

View reviewed changes

imagery/i.cca/testsuite/test_i_cca.py Show resolved Hide resolved

black

fdf74ae

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

neteler added the backport to 8.4 PR needs to be backported to release branch 8.4 label Dec 3, 2024

neteler reviewed Dec 3, 2024

View reviewed changes

fix landsat map names for CI

5968b5b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

i.cca: use 0 based array indexing #3239

i.cca: use 0 based array indexing #3239

marisn commented Nov 15, 2023

marisn commented Nov 16, 2023

metzm Dec 3, 2023

marisn Dec 5, 2023

metzm Dec 5, 2023 •

edited

Loading

metzm Dec 5, 2023

neteler commented Dec 3, 2024

neteler Dec 3, 2024 •

edited

Loading

neteler Dec 3, 2024

marisn Dec 3, 2024

neteler Dec 3, 2024

neteler Dec 4, 2024

neteler commented Dec 4, 2024

marisn commented Dec 5, 2024

i.cca: use 0 based array indexing #3239

Are you sure you want to change the base?

i.cca: use 0 based array indexing #3239

Conversation

marisn commented Nov 15, 2023

marisn commented Nov 16, 2023

metzm Dec 3, 2023

Choose a reason for hiding this comment

marisn Dec 5, 2023

Choose a reason for hiding this comment

metzm Dec 5, 2023 • edited Loading

Choose a reason for hiding this comment

metzm Dec 5, 2023

Choose a reason for hiding this comment

neteler commented Dec 3, 2024

neteler Dec 3, 2024 • edited Loading

Choose a reason for hiding this comment

neteler Dec 3, 2024

Choose a reason for hiding this comment

marisn Dec 3, 2024

Choose a reason for hiding this comment

neteler Dec 3, 2024

Choose a reason for hiding this comment

neteler Dec 4, 2024

Choose a reason for hiding this comment

neteler commented Dec 4, 2024

Minimalistic i.cca Example with Synthetic Data

marisn commented Dec 5, 2024

metzm Dec 5, 2023 •

edited

Loading

neteler Dec 3, 2024 •

edited

Loading